Introduction

The general lack of transparency in energy system modelling leads to irreproducible results and a loss of traceability in decision making, which contradicts the general requirements for good scientific work. In 2012, the report of the EU-funded project ATEsT stated: "increasing the transparency in supporting of the EU RD&D policy initiatives was suggested as an overall action." [ATEsT 2012]. Besides leading to incomprehensible policy, this lack of transparency is an important obstacle to the scientific debate on energy system modelling: it hampers a profound discussion of the strengths and weaknesses of existing models and of which questions they can or cannot answer. The openmod initiative¹ writes in its manifesto: "We believe that more openness in energy modelling increases transparency and credibility, reduces wasteful double-work and improves overall quality." [openmod]. An article specifies the problem further [Pfenniger 2017]: "Sharing a DNA sequence in an established format is, of course, easier than sharing the unstructured assumptions behind a techno-economic scenario study, for which no standard format exists yet. So the energy community must decide on standards for sharing code, data and assumptions." This paper picks up this thread and aims to contribute to a standardisation process for sharing information about energy system analyses. It sheds light on the gap between opening up and actually achieving transparency or even reproducibility. The latter is defined according to the rules of good scientific practice of the German Research Foundation (DFG). Transparency can thus be seen as a necessary precondition for reproducibility. Both transparency and reproducibility are only part of the criteria for the quality of models or data; this paper focuses on these two. The research is based on an in-house application that can be generalised. The main goal is to document the applicability of existing tools and to open them up for further discussion. Some of the evaluated tools, as well as the suggestions for improvement, were designed through cross-institutional collaboration and expert workshops, which were a fundamental part of the ongoing research projects open_eGo [open_eGo] and open_FRED [open_FRED]. The general framework outlined above leads to the following research questions that we want to address:

What is the status quo of transparency/reproducibility in the selected case study?

Which areas need improvement?

What could these improvements look like?


Methodology 

At the outset, the need for a clear and intelligible definition of key terms is addressed. Instead of the separate sections for abbreviations and definitions usually found in each publication, a collective public documentation was started. A shared glossary aims to develop a common understanding of terms. In this way, different interpretations and applications become apparent and can be discussed openly. In addition to the definitions, supplementary information (abbreviations and sources) and relations (synonyms and subterms) extend the concept towards an ontology. A German equivalent can be found in the central information system EnArgus [BMWi]. Following this, the options for using different types of licenses for data, software and artwork are presented.

¹ open energy modelling initiative: an international grassroots initiative of modellers from various organisations
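The glossary concept, definitions extended by abbreviations, sources, synonyms and subterms, can be illustrated with a small data structure. The following Python sketch is hypothetical: the names `Term` and `Glossary` and the placeholder definitions are assumptions for illustration and do not reproduce the schema of the shared glossary or of EnArgus.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Term:
    """Hypothetical glossary entry: a definition plus ontology-like relations."""
    name: str
    definition: str
    abbreviation: Optional[str] = None
    sources: List[str] = field(default_factory=list)
    synonyms: List[str] = field(default_factory=list)
    subterms: List[str] = field(default_factory=list)


class Glossary:
    """Looks up a term by its name, its abbreviation or one of its synonyms."""

    def __init__(self, terms):
        self._index = {}
        for term in terms:
            keys = [term.name, *term.synonyms]
            if term.abbreviation:
                keys.append(term.abbreviation)
            for key in keys:
                self._index[key.lower()] = term  # case-insensitive index

    def lookup(self, word):
        """Return the matching Term, or None if the word is unknown."""
        return self._index.get(word.lower())


# Placeholder definitions; only the term names are taken from the text.
glossary = Glossary([
    Term("fact sheet", "<definition>", synonyms=["factsheet"],
         subterms=["framework fact sheet", "model fact sheet", "scenario fact sheet"]),
    Term("OpenEnergy Database", "<definition>", abbreviation="oedb"),
])

print(glossary.lookup("Factsheet").name)  # fact sheet
print(glossary.lookup("OEDB").name)       # OpenEnergy Database
```

Indexing synonyms and abbreviations to the same entry is what lets diverging spellings or names resolve to one shared definition.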

The main task is to apply the transparency checklist described in [Cao 2016] to a case study and to carry out a critical self-assessment in order to develop, present and evaluate standardised tools. In addition to the published report, all publicly available accompanying materials are assessed critically. These comprise the compiled input data including assumptions, the energy system model that was built and, of course, the resulting data. The transparency of the interpretation of the results and of the drawing of conclusions is not assessed in this investigation (see Figure 1).


Figure 1: Simplified model flow. Figure by Ludwig Hülk / CC BY-SA 4.0


The ideas, templates and tools developed in the above-named research projects are presented. These are the multi-level and linked “fact sheets”, the “OpenEnergy Database” (oedb) and the “scenario log”. The central idea of the fact sheets is to provide a standard online form that simplifies the input of information, gives a clearly arranged representation and facilitates public availability. The development of the fact sheets included an evaluation of the transparency checklist, of the list of open models in the [openmod-wiki] and of other materials and contributions from different institutes. In two expert workshops and in online discussions, the fact sheets were improved and divided into fact sheets for frameworks, models and scenarios.
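The split into framework, model and scenario fact sheets avoids redundancy because each level links to the level below instead of repeating its content. The following Python sketch illustrates this linking under stated assumptions: the class and field names are hypothetical (the actual structure is the one in the UML class diagram of Figure 3), and the scenario name and parameter are invented.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FrameworkFactSheet:
    name: str


@dataclass
class ModelFactSheet:
    name: str
    # Link to the framework instead of copying its description.
    framework: Optional[FrameworkFactSheet] = None


@dataclass
class ScenarioFactSheet:
    name: str
    model: ModelFactSheet
    # Parameter configuration the model was run with (hypothetical keys).
    parameters: Dict[str, float] = field(default_factory=dict)

    def lineage(self) -> str:
        """Follow the links scenario -> model -> framework."""
        chain = [self.name, self.model.name]
        if self.model.framework is not None:
            chain.append(self.model.framework.name)
        return " -> ".join(chain)


oemof = FrameworkFactSheet("oemof")
abbb = ModelFactSheet("abbb", framework=oemof)
scenario = ScenarioFactSheet("example scenario", model=abbb,
                             parameters={"co2_price": 20.0})

print(scenario.lineage())  # example scenario -> abbb -> oemof
```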

The three kinds of fact sheets were implemented as part of the OpenEnergy Platform, a component of the open_eGo project that is still in progress, and parts of them have also been added to the wiki of the openmod initiative. The search and filter functions for model and framework fact sheets substantially improve inter-project communication and facilitate intended code reuse.
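The search and filter function can be pictured as selecting fact sheets by their attributes. This Python sketch is only an illustration: the attributes (`framework`, `sectors`, `open_source`) and the model names are hypothetical and are not the fields used on the OpenEnergy Platform.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class ModelFactSheet:
    name: str
    framework: str
    sectors: Set[str] = field(default_factory=set)
    open_source: bool = False


def filter_models(sheets, sector: Optional[str] = None,
                  framework: Optional[str] = None, open_source_only: bool = False):
    """Return the fact sheets that match every given criterion (None means 'any')."""
    result = []
    for sheet in sheets:
        if sector is not None and sector not in sheet.sectors:
            continue
        if framework is not None and sheet.framework != framework:
            continue
        if open_source_only and not sheet.open_source:
            continue
        result.append(sheet)
    return result


sheets = [
    ModelFactSheet("model_a", "framework_x", {"electricity", "heat"}, open_source=True),
    ModelFactSheet("model_b", "framework_x", {"electricity"}, open_source=False),
    ModelFactSheet("model_c", "framework_y", {"heat"}, open_source=True),
]

names = [s.name for s in filter_models(sheets, sector="heat", open_source_only=True)]
print(names)  # ['model_a', 'model_c']
```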


Figure 3: UML class diagram of fact sheets. Figure by Martin Glauer / CC BY-SA 4.0

External developers get an overview of existing functionalities and methods, which they can reuse in accordance with the corresponding licenses. In this way, double work is avoided, since typical functions used in energy system models do not need to be developed over and over again. Additionally, errors are found and fixed more readily, because different model developers and users work with the code and can report bugs and help with debugging. Users who are new to the field of energy modelling get a quick overview of existing frameworks and models that suit their purposes, together with immediate information on the data sets that might be used and on publications about these tools.

Whilst model and framework fact sheets are rather technical and mostly of interest to programmers, scenario fact sheets tackle some of the flaws in current scientific energy modelling mentioned in the introduction. The metadata they provide reveal the assumptions that lead to specific results. Scenario fact sheets contain information not only on the data and models used, but also on the parameter configuration they were used with. This makes it possible to get an overview of scenarios that used similar assumptions, which might allow conclusions on how specific parameters influence the results of energy models.
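Comparing scenarios by their parameter configuration could, for example, rank scenarios by the share of identical parameter settings. The Python sketch below uses the Jaccard similarity of (parameter, value) pairs; this measure, the scenario names and the parameters are assumptions chosen for illustration, not a feature documented for the scenario fact sheets.

```python
def shared_assumptions(a: dict, b: dict) -> dict:
    """Parameters that two scenarios set to the same value."""
    return {key: value for key, value in a.items() if key in b and b[key] == value}


def similarity(a: dict, b: dict) -> float:
    """Jaccard similarity of the (parameter, value) pairs of two scenarios.

    Values must be hashable (numbers, strings, tuples).
    """
    pairs_a, pairs_b = set(a.items()), set(b.items())
    union = pairs_a | pairs_b
    return len(pairs_a & pairs_b) / len(union) if union else 1.0


def most_similar(target: str, scenarios: dict) -> list:
    """Rank all other scenarios by their similarity to the target scenario."""
    ranked = [(name, similarity(scenarios[target], params))
              for name, params in scenarios.items() if name != target]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical scenario parameters.
scenarios = {
    "s1": {"co2_price": 20, "lignite_exit": 2040, "weather_year": 2011},
    "s2": {"co2_price": 20, "lignite_exit": 2030, "weather_year": 2011},
    "s3": {"co2_price": 60, "lignite_exit": 2030, "weather_year": 2012},
}

print(most_similar("s1", scenarios))  # [('s2', 0.5), ('s3', 0.0)]
print(shared_assumptions(scenarios["s1"], scenarios["s2"]))  # {'co2_price': 20, 'weather_year': 2011}
```

Similar scenarios that still differ in results then point to the parameters they do not share.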

The OpenEnergy Database plays only a minor role in this paper. In the projects, it is used to exchange and publish data, but there are other ways to provide access to data. One strength of the oedb is that it provides metadata, which increases the quality of the data.
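How metadata can support data quality is easiest to see in a completeness check before publication. The Python sketch below assumes a hypothetical minimal set of keys; it is not the oedb metadata specification, whose actual fields should be taken from its own documentation.

```python
import json

# Hypothetical minimum set of metadata keys (not the oedb metadata specification).
REQUIRED_KEYS = ("title", "description", "sources", "license")


def missing_metadata(record: dict) -> list:
    """Return the required keys that are absent or empty in a metadata record."""
    return [key for key in REQUIRED_KEYS if not record.get(key)]


metadata = json.loads("""
{
    "title": "Example table",
    "description": "Hourly electricity demand per subregion",
    "sources": [],
    "license": "CC-BY-4.0"
}
""")

print(missing_metadata(metadata))  # ['sources']
```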

A supplementary documentation method that was developed to improve reproducibility is presented as well. Much like a logbook, the “scenario log” records work steps and data flows. Its functions aim at a high degree of automation while following the basic principles of good scientific practice; it can be seen as an adapted version of the well-known laboratory notebook.

As a case study, a current research assignment that uses a comparatively simple model and a reduced scenario framework was selected. The regional association of the German Green Party Bündnis 90/Die Grünen commissioned a study to assess the impact of the Energy Strategy 2030 [MWE 2012] of the federal state of Brandenburg. The main focus was the investigation of changes in the political and economic conditions regarding lignite utilisation and the fulfilment of the CO2 reduction targets. The approach was to build a model of the local energy system that considers heat and electricity demand and generation. It is based on the open energy modelling framework (oemof) [oemof 2016] and is called oemof app Berlin Brandenburg (abbb). Different types of heat and power generation, the projected consumption and the expected export of electrical energy are taken into account to determine hourly load flows within and between the five subregions and the surrounding export regions. In three scenarios, carbon dioxide emissions and the utilisation of the existing power plants and of the existing grid were analysed to examine the viability of the energy strategy. The final report aims to strike a balance between technical documentation, comprehensibility for non-experts, clear recommendations for action and maximum transparency.
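The quantities analysed in the case study (hourly load flows, exports and CO2 emissions) rest on simple balance and emission arithmetic. The following Python sketch is a toy illustration only: it does not reproduce abbb or the oemof optimisation, and the region names, hourly values and emission factors are invented placeholders.

```python
# Invented hourly generation and demand in MWh for two placeholder subregions.
generation = {
    "region_1": {"lignite": [300, 300, 300], "wind": [120, 40, 10]},
    "region_2": {"lignite": [0, 0, 0], "wind": [80, 60, 20]},
}
demand = {"region_1": [200, 220, 260], "region_2": [90, 100, 110]}

# Placeholder emission factors in t CO2 per MWh of electricity.
EMISSION_FACTOR = {"lignite": 1.1, "wind": 0.0}


def hourly_balance(region: str) -> list:
    """Generation minus demand per hour: a surplus (> 0) can be exported,
    a deficit (< 0) must be covered by imports from neighbouring regions."""
    total = [sum(hour) for hour in zip(*generation[region].values())]
    return [g - d for g, d in zip(total, demand[region])]


def co2_emissions(region: str) -> float:
    """Emissions = sum over technologies of generated energy times emission factor."""
    return sum(EMISSION_FACTOR[tech] * sum(values)
               for tech, values in generation[region].items())


print(hourly_balance("region_1"))  # [220, 120, 50]
print(hourly_balance("region_2"))  # [-10, -40, -90]
```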


Results 

Although the glossary is not yet widely used, it makes it possible to refer to publicly available definitions. All terms used in the fact sheets can be clarified there. Examples of terms with diverging definitions that were discussed in the course of the projects are “scenario”, “endogenous” and “exogenous” parameters, “processed data” and “secondary data”.

One crucial point that arises when data are to be opened up for common use is the question of licenses. Making data publicly available does not automatically mean that they may be used. A license clarifies the conditions under which data or code can be used. In the absence of a license, the author retains all rights under copyright. Standard licenses provide predefined sets of standard conditions. The most common licenses for a given artefact can be determined by its type: code, data, documentation or other generic digital "creative work" such as figures and maps. For any given project, these components can and should be licensed independently by type. At the most basic level, one must decide whether to use a copyleft license or a more permissive one. While copyleft ensures that derived works, including code changes by any future contributor, must be distributed under the same license and thus stay open, permissive licenses only require attribution in derived works.
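Licensing each component by type, and knowing whether a chosen license is copyleft or permissive, can be kept explicit in a small lookup. The Python sketch below uses real SPDX license identifiers, but the classification covers only a few common licenses, and the project assignment is a hypothetical example, not a recommendation.

```python
# A few common licenses by SPDX identifier; this list is deliberately not exhaustive.
COPYLEFT = {"GPL-3.0-or-later", "AGPL-3.0-or-later", "CC-BY-SA-4.0", "ODbL-1.0"}
PERMISSIVE = {"MIT", "Apache-2.0", "BSD-3-Clause", "CC-BY-4.0"}


def license_kind(spdx_id: str) -> str:
    """Classify a license as 'copyleft', 'permissive' or 'unknown'."""
    if spdx_id in COPYLEFT:
        return "copyleft"
    if spdx_id in PERMISSIVE:
        return "permissive"
    return "unknown"


def unlicensed(components, licenses: dict) -> list:
    """Components without a license; their author retains all rights."""
    return [component for component in components if component not in licenses]


# Hypothetical project in which each component type is licensed independently.
project_licenses = {
    "code": "AGPL-3.0-or-later",
    "data": "ODbL-1.0",
    "documentation": "CC-BY-4.0",
    "figures": "CC-BY-SA-4.0",
}

for component, spdx_id in project_licenses.items():
    print(f"{component}: {spdx_id} ({license_kind(spdx_id)})")

print(unlicensed(["code", "data", "documentation", "figures", "maps"], project_licenses))  # ['maps']
```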

The central tool applied in this study is the transparency checklist [Cao 2016]. It is structured into ‘General Information’, ‘Empirical Data’, ‘Assumptions’, ‘Modeling’, ‘Results’, and ‘Conclusions and Recommendations’. Evaluating the transparency checklist and applying it to a study led to two additional categorisations of the questions. First, the development of standardised fact sheets resulted in sorting the questions into framework, model and scenario fact sheets in order to avoid redundant information. Second, the questions were divided into “general transparency”, “reproducibility” and “quality”-related questions. “General transparency” covers facts that are not necessary for reproducibility, e.g. the name of the author or the licence. “Reproducibility” covers those facts of transparency that are necessary to reproduce the work. “Quality”-related questions, in contrast, evaluate the quality of the study and are not related to transparency. This last distinction seems particularly important, because making a model or scenario simulation transparent does not mean that its quality is high; it only makes it possible to evaluate its quality.

To apply the checklist to our study, we used all documents and data that will be available after its publication, including the report, the database tables and the fact sheets. The latter were the most important sources of information. Links to the material will be published on the [RLI-project-page].

The full list of questions with their categorisation and the evaluation of our study can be found in [checklist-categories]. Table 1 shows the results of the evaluation. The first column (“Field”) comprises the original categories of the transparency checklist. The columns “T”, “R” and “Q” state how many questions of each category have been assigned to general transparency, reproducibility and quality, respectively. The “Score” columns show the share of these questions that could be answered with the publicly available information.


Table 1: Results of the application of the checklist per category

Field                              T   Score    R   Score    Q   Score     Total
General Information                6     83%    0       -    2    100%       88%
Empirical Data                     0       -    2    100%    1    100%      100%
Assumptions                        3    100%    2    100%    6     33%       64%
Model exercise                     6    100%    5     80%    3     67%       86%
Results                            0       -    2      0%    2     50%       25%
Conclusions and Recommendations    0       -    0       -    5     20%       20%
Sum / average                     15     93%   11     73%   19     47%  45 / 69%
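As a consistency check, every score in Table 1 can be recomputed from the question counts and the number of answered questions they imply (for example, 83% of 6 general-transparency questions means 5 answered). The Python sketch below derives those answered counts from the table itself; it adds no data beyond Table 1.

```python
# (number of questions, number answered) per field and category, derived from Table 1.
# T = general transparency, R = reproducibility, Q = quality.
TABLE_1 = {
    "General Information":             {"T": (6, 5), "R": (0, 0), "Q": (2, 2)},
    "Empirical Data":                  {"T": (0, 0), "R": (2, 2), "Q": (1, 1)},
    "Assumptions":                     {"T": (3, 3), "R": (2, 2), "Q": (6, 2)},
    "Model exercise":                  {"T": (6, 6), "R": (5, 4), "Q": (3, 2)},
    "Results":                         {"T": (0, 0), "R": (2, 0), "Q": (2, 1)},
    "Conclusions and Recommendations": {"T": (0, 0), "R": (0, 0), "Q": (5, 1)},
}


def score(answered: int, total: int):
    """Answered share as a whole-number percentage; None where the table shows '-'.

    Python's round() rounds halves to even (87.5 -> 88), which matches the table.
    """
    return None if total == 0 else round(100 * answered / total)


def field_total(name: str) -> int:
    """'Total' column: answered share over all categories of one field."""
    cells = TABLE_1[name].values()
    return score(sum(a for _, a in cells), sum(n for n, _ in cells))


def category_average(category: str):
    """'Sum / average' row: question count and answered share of one category."""
    cells = [row[category] for row in TABLE_1.values()]
    total = sum(n for n, _ in cells)
    return total, score(sum(a for _, a in cells), total)


def overall():
    """Bottom-right cell: all questions and the overall answered share."""
    cells = [cell for row in TABLE_1.values() for cell in row.values()]
    total = sum(n for n, _ in cells)
    return total, score(sum(a for _, a in cells), total)


print({c: category_average(c) for c in "TRQ"})  # {'T': (15, 93), 'R': (11, 73), 'Q': (19, 47)}
print(overall())  # (45, 69)
```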

The evaluation shows that only one question concerning general transparency (1.2 “What are the authors’ contributions in detail?”) could not be answered. The development of the model code can be tracked in its version control on GitHub, but other parts, such as data processing and the writing of the final report, are not yet documented openly. For reproducibility, the score of 73% also leaves room for improvement. This is due to the missing openness of the output data and post-processing (14.1 “Where can one find the numerical values (output) of the model?”). One possible solution is to import the output data into the database. Quality-related questions are only partly answered. Most of these questions address assumptions, uncertainties and validation. These are important issues that could be addressed in the documentation or the final report.

With regard to reproducibility, the transparency checklist is not yet sufficient. It does not evaluate whether the data used are published and whether the published data are clearly arranged. Furthermore, it does not evaluate the process of implementing models or linking them.

The modelling process is described by the following steps [openmod manifesto]: input of raw data, data processing, model formulation, numerical solver, model output and interpretation (see also Figure 1). For the first two steps, opening up means providing "Open Data". Data processing, as well as steps three and four, additionally requires open source software (scripts, models, packages, etc.). Step five is not a big challenge to open up, as model results are usually published; the open question is whether all output data (before post-processing) should be included. We leave step six, "interpretation", for future work. We would even split it into two steps, "conclusions" and "recommendations", whose process and criteria should be defined and opened up.
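The mapping from modelling steps to what opening them up requires can itself be written down as a checklist. The Python sketch below encodes steps one to five as described above (step six is left for future work, as in the text); the requirement labels are shorthand chosen for this illustration.

```python
# Steps 1-5 of the modelling process and the artefacts needed to open each of them up.
STEPS = [
    ("input of raw data", {"open data"}),
    ("data processing", {"open data", "open source software"}),
    ("model formulation", {"open source software"}),
    ("numerical solver", {"open source software"}),
    ("model output", {"published output data"}),
]


def open_gaps(published: set) -> list:
    """Return the steps whose requirements are not covered by what has been published."""
    return [step for step, needs in STEPS if not needs <= published]


# Example: data and software are open, but the raw model output is not published.
print(open_gaps({"open data", "open source software"}))  # ['model output']
```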

Due to the heterogeneous landscape of transmission system operators (TSOs) and other energy providers in Europe, as well as the decentralised structure of the scientific community, there is a vast number of different data structures and sources. This makes it difficult to agree on a common basis for comparing models. Hence, a common source of energy data that merges different data models into a usable and comparable structure is essential. The OpenEnergy Database [oedb] was developed to provide a place where raw data and processed data can be published and linked to. Using it for the simple scenario met expectations, although endogenous data are not stored in the database but only in the model. Even if the model is open source as well, the data used cannot then be found in one place, which reduces transparency. This can be avoided by making endogenous data available in the model fact sheet. Referencing every single value used can be very time-consuming if individual cells of a table have to be annotated. The process foreseen for this in the oedb is still too complicated and needs to be automated.
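One way to reduce the effort of referencing individual values would be to annotate sources per column and record only the exceptions per cell. The Python sketch below illustrates this idea under that assumption; it is not the process implemented in the oedb, and the row, column and source names are placeholders.

```python
def resolve_sources(rows, column_sources: dict, cell_overrides: dict) -> dict:
    """Return the source reference for every (row, column) cell of a table.

    column_sources: column -> source that applies to the whole column
    cell_overrides: (row, column) -> source for the few cells that differ
    """
    resolved = {}
    for row in rows:
        for column, source in column_sources.items():
            resolved[(row, column)] = cell_overrides.get((row, column), source)
    return resolved


rows = ["region_1", "region_2", "region_3"]
column_sources = {"capacity": "source A", "efficiency": "source B"}
cell_overrides = {("region_2", "capacity"): "source C"}  # one manual annotation

sources = resolve_sources(rows, column_sources, cell_overrides)
print(len(sources), sources[("region_2", "capacity")])  # 6 source C
```

Six cells end up referenced while only three annotations (two columns, one exception) had to be written by hand.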

To also open up the process of model implementation, or even of scenario simulations with linked models, we developed the “scenario log”.

Figure 2: Enhanced model flow. Figure by Ludwig Hülk / CC BY-SA 4.0


The scenario log is based on the idea of an analogue laboratory notebook as an organisational tool to record different kinds of actions by multiple users. In our case, it is implemented as a simple database table to which an entry is added each time a script or program is executed (see grey arrows in Figure 2). From the perspective of the database and the data flow, there are different possible actions, such as input and output. In addition, useful information is either added manually or obtained automatically (*), see Table 2. While there are no standardised guidelines for documentation, one common criterion is the use of permanent entries. This is enforced through user rights that allow only inserts, not deletion or modification.
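The insert-only principle can be illustrated with SQLite from the Python standard library. SQLite has no user rights, so this sketch substitutes triggers that abort every UPDATE and DELETE; in a server database the same effect would come from granting users only INSERT, as described above. The table and column names are assumptions modelled loosely on Table 2, not the project's actual schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE scenario_log (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,  -- filled automatically
    version   TEXT,
    action    TEXT CHECK (action IN ('setup', 'import', 'export', 'input', 'output')),
    path      TEXT,
    source    TEXT,
    comment   TEXT,
    user_name TEXT,
    timestamp TEXT DEFAULT CURRENT_TIMESTAMP,     -- filled automatically (UTC)
    metadata  TEXT
);
-- Stand-in for insert-only user rights: refuse any change to existing entries.
CREATE TRIGGER scenario_log_no_update BEFORE UPDATE ON scenario_log
BEGIN SELECT RAISE(ABORT, 'scenario_log is append-only'); END;
CREATE TRIGGER scenario_log_no_delete BEFORE DELETE ON scenario_log
BEGIN SELECT RAISE(ABORT, 'scenario_log is append-only'); END;
""")

with con:  # commits the insert
    con.execute(
        "INSERT INTO scenario_log (version, action, path, source, comment, user_name) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        ("v0.1", "setup", "schema.table", "egodp_rea_setup.sql", "Setup script", "example_user"),
    )


def is_blocked(sql: str) -> bool:
    """Return True if the database refuses the statement."""
    try:
        con.execute(sql)
    except sqlite3.DatabaseError:
        return True
    return False


print(is_blocked("DELETE FROM scenario_log"))                     # True
print(is_blocked("UPDATE scenario_log SET comment = 'changed'"))  # True
```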


Table 2: Description of the scenario log (* obtained automatically)

Entry       Description                               Example
id*         Serial number, unique identifier          1
version     Version number of programme or script     v0.1
action      Type of action or data flow               setup, import, export, input, output
path        Database or file path                     schema.table
source      Name of the script or programme           egodp_rea_setup.sql
comment     Additional information                    Setup script
            Name of the database user                 example_user
timestamp*  Date and time                             2017-02-17 16:20:00
metadata*   A copy of the metadata if available       {metastring}
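Entries marked with * in Table 2 need no manual input. The following is a minimal sketch of how a script could write its own log entry each time it runs, assuming a Python decorator and an in-memory SQLite table. The decorator name `logged`, the column names and the example step are hypothetical, and the user name is passed explicitly because SQLite, unlike a server database, has no database users.

```python
import datetime
import functools
import json
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE scenario_log (id INTEGER PRIMARY KEY AUTOINCREMENT, version TEXT, "
    "action TEXT, path TEXT, source TEXT, comment TEXT, user_name TEXT, "
    "timestamp TEXT, metadata TEXT)"
)


def logged(action, path, source, version, user_name, comment="", metadata=None):
    """Decorator that appends one scenario-log entry after each successful run."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            # id is assigned by the database; timestamp and metadata are filled here.
            timestamp = datetime.datetime.now().isoformat(sep=" ", timespec="seconds")
            with con:
                con.execute(
                    "INSERT INTO scenario_log (version, action, path, source, comment, "
                    "user_name, timestamp, metadata) VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
                    (version, action, path, source, comment, user_name, timestamp,
                     json.dumps(metadata) if metadata is not None else None),
                )
            return result
        return wrapper
    return decorator


@logged(action="output", path="schema.table", source="example_step.py",
        version="v0.1", user_name="example_user", comment="hypothetical step")
def write_results():
    return "done"


write_results()
write_results()
rows = con.execute("SELECT id, action, source FROM scenario_log").fetchall()
print(rows)  # [(1, 'output', 'example_step.py'), (2, 'output', 'example_step.py')]
```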

Conclusion 

By evaluating the status quo of transparency, we identified areas that need improvement. The comparison of the checklist with the case study shows that the majority of the transparency (93%) and reproducibility (73%) criteria are fulfilled. Only the questions concerning the actual contributions of each author and the output data remain unanswered. The presented scenario log is one possible solution for recording individual work steps and data flows in order to increase transparency. Missing output data can be imported into the OpenEnergy Database or published elsewhere online. Questions categorised as quality criteria are only partly answered (47%). Standardised solutions for documenting assumptions, uncertainties and validation must be developed in future work. The analysis shows that the fact sheets are suitable for recording the most important information and also provide other useful functions such as predefined answers and filters.

New processes and instruments have been developed to improve transparency, reproducibility and thus the quality of energy system analyses. The overall concept of good scientific practice should include a variety of tools and solutions, covering different areas from proper licensing to the complete publication of accompanying materials. This compilation is intended as a further step towards a holistic view of scientific work.